Berghanel et al. (2016) wanted to understand the role of prenatal maternal stress on the development phenotype of offspring, as well as if food availabiliy (and therefore food intake) impacts stress on pregnant mothers. They propose two ways that maternal stress could effect offspring phenotypes:
Developmental restraints hypothesis: High prenatal stress due to lack of food availability will lead to offspring reducing their investment into development to reduce the chances of starvation.
Predictive adaptive response (PARs): High prenatal stress due to lack of food availability will lead to accelerated growth in offspring in preparation for the likelihood of a reduced lifespan in adverse conditions.
The authors wanted to test these hypotheses in natural conditions, so they did so in a group of Assamese macaques (Macaca assamensis) that have previously shown correlations between low food availability and low growth rates and reduced social play. The group inhabits the highly variable forest of Phu Khieo Wildlife Sanctuary in Thailand.
Maternal stress was measured through elevation of prenatal glucocorticoid levels (preGC) in fecal samples of the macaques. Behavioral data, such as offspring social play and variables describing maternal style, were collected via 30 minute focal observations at one minute intervals. Furthermore, monthly values for offspring size and growth rate were estimated with photogrammetry of the length of lower arm from birth to the end of the study. As for environmental conditions potentially inducing stress, food availabity was calculated using fruit abundance and tree density during months before, during and after the mother’s gestation period. All variables, except for body size index, PreGC and PostGC, were z-transformed. I have separated the data relevant for each model into their own csv file, and each has been individually uploaded into the reposity along with the original excel file the data came in.
The authors predicted that maternal food availability is negatively correlated to PreGC levels. However, they expect that this could also be associated with maternal rank and offspring sex, as these may impact how much food a female receives or needs, respectively. The two alternative hypotheses that are tested reflect the PAR and developmental constraint (DC) hypotheses:
Offspring phenotype, however, may be mediated by food availability, offspring sex, maternal care-taking, maternal stress during lactation, and energy uptake (food availability post-gestation) and use (offspring social play). Therefore, the authors “controlled” for these variables.
I will be attempting to recreate the first four models (three generalized least-squares models (GLS) and one linear model) of this study, as well as a PCA on maternal style. The authors control for certain variables by adding them into their models, and check the each models’ residual distribution for normality.
Figure 1 I will be reproducing models 1-4 and a PCA (not shown). “Causes and consequences of maternal physiological stress. Red, females; blue, males. Values in brackets: reduced model after exclusion of 6 the collinear control variable(s) (see text). Superscript 1 in the artwork denotes model residuals (partial regression plot). All fixed effects were z-transformed. Sex: male/female 1⁄4 0/1. (a) Prenatal food availability predicted gestational maternal GC level (PreGC) (model 1, GLS, response variable: PreGC (individual samples, log-transformed), grouping variable: mother ID; $ on the day the GC in the faecal sample were produced (‘present’) or during the three month leading up to the sampling day (‘before’). (b) Postnatal maternal GC level (PostGC) and rejectiveness, and by trend also protectiveness, were independently related to PreGC (model 2, LM, response variable: average PreGC during gestation). (c) PreGC during the first and second gestational trimester predicted postnatal growth rates (model 3, GLS, response variable: monthly body size index, grouping variable: infant ID; from birth until age of separate measurement). We report the main effect for age only because all other main effects do not inform the research question. Chart: the interaction between age and early-to-mid-gestational PreGC of the reduced model is plotted (i.e. the influence of PreGC on the estimate of age; shaded: 95% confidence interval; package: interplot [66]). (d) Body size at the age of 16–18 months was predicted by early-to-mid-gestational PreGC (model 4, GLS, response variable: body size indices at 16 – 18 months of age, grouping variable: infant ID; from birth until age of separate measurement).” (Berghanel et al. 2016).
Let’s do it…
Here are the packages that will be used in this analysis:
Here the authors test for their first hypothesis, that stress is induced by low food availability. To do this, they ran a GLS model with PreGC as the dependent variable and food availability as the independent variable.
Get the data. For each data upload, I wil be using the curl package to get it from my GitHub Repo:
library(curl)
f <- curl("https://raw.githubusercontent.com/MelZarate/mazarate-Berghanel2016-replication-assignment/master/model_1.csv")
mod_1 <- read.csv(f, header = TRUE, sep = ",", stringsAsFactors = TRUE)
head(mod_1)
## PreGC ln.PreGC. Gestation.day nPrenatal.FA.present
## 1 180.2105 5.194 8 1.738
## 2 135.2727 4.907 10 1.732
## 3 164.4444 5.103 14 1.721
## 4 131.0345 4.875 29 1.560
## 5 142.2222 4.957 60 0.968
## 6 256.6667 5.548 69 0.762
## nPrenatal.FA.before Maternal.rank Sex.of.the.offspring Year.of.birth
## 1 1.614 -1.07 -1.019 -0.439
## 2 1.597 -1.07 -1.019 -0.439
## 3 1.559 -1.07 -1.019 -0.439
## 4 1.398 -1.07 -1.019 -0.439
## 5 1.086 -1.07 -1.019 -0.439
## 6 0.973 -1.07 -1.019 -0.439
## Gestation.day.1 Day.time Mother.ID
## 1 -1.522 -1.692 Whi
## 2 -1.480 0.507 Whi
## 3 -1.395 -0.224 Whi
## 4 -1.079 -1.082 Whi
## 5 -0.424 0.221 Whi
## 6 -0.234 1.581 Whi
summary(mod_1)
## PreGC ln.PreGC. Gestation.day nPrenatal.FA.present
## Min. : 39.02 Min. :3.664 Min. : 0.0 Min. :-1.6980000
## 1st Qu.:111.03 1st Qu.:4.710 1st Qu.: 36.0 1st Qu.:-0.9860000
## Median :148.62 Median :5.001 Median : 77.0 Median : 0.4120000
## Mean :164.54 Mean :5.003 Mean : 80.1 Mean :-0.0000304
## 3rd Qu.:198.04 3rd Qu.:5.288 3rd Qu.:122.0 3rd Qu.: 0.7110000
## Max. :512.54 Max. :6.239 Max. :163.0 Max. : 1.7470000
##
## nPrenatal.FA.before Maternal.rank Sex.of.the.offspring
## Min. :-1.9670000 Min. :-1.4950000 Min. :-1.0190000
## 1st Qu.:-0.8670000 1st Qu.:-0.8580000 1st Qu.:-1.0190000
## Median : 0.0890000 Median :-0.0080000 Median : 0.9780000
## Mean :-0.0000405 Mean :-0.0000574 Mean :-0.0002601
## 3rd Qu.: 0.9320000 3rd Qu.: 0.8420000 3rd Qu.: 0.9780000
## Max. : 1.6440000 Max. : 1.4790000 Max. : 0.9780000
##
## Year.of.birth Gestation.day.1 Day.time Mother.ID
## Min. :-0.4390000 Min. :-1.6910000 Min. :-1.8640 Wea : 33
## 1st Qu.:-0.4390000 1st Qu.:-0.9310000 1st Qu.:-0.7672 Naa : 29
## Median :-0.4390000 Median :-0.0650000 Median :-0.0525 Pai : 27
## Mean : 0.0001351 Mean :-0.0000101 Mean : 0.0000 Pat : 27
## 3rd Qu.:-0.4390000 3rd Qu.: 0.8840000 3rd Qu.: 0.6995 Stu : 25
## Max. : 2.2690000 Max. : 1.7500000 Max. : 4.7340 Tha : 22
## (Other):133
The data here includes PreGC, gestatio day (in days after conception), prenatal food availability (FA) presently and before birth, maternal rank, offspring sex, year of birth, gestation day, time of day, and mother ID. All variables besides mother ID and preGC were z-transformed.
As I mentioned before, the authors checked the residuals of each model for normal distribution. However, I am going to start just by checking out the distribution of the dependent and independent variables for this model.
PreGC:
hist(mod_1$PreGC,probability=TRUE, main="Histogram of normal
data",xlab="Approximately normally distributed data")
qqnorm(mod_1$PreGC,main="Normal QQ plot random normal variables (height)")
qqline(mod_1$PreGC,col="gray")
Food Availabilty:
hist(mod_1$nPrenatal.FA.before,probability=TRUE, main="Histogram of normal
data",xlab="Approximately normally distributed data")
qqnorm(mod_1$nPrenatal.FA.before,main="Normal QQ plot random normal variables (height)")
qqline(mod_1$nPrenatal.FA.before,col="gray")
These both look pretty far from normal, so let’s trying looking at the log-transformed variables.
hist(log(mod_1$PreGC),probability=TRUE, main="Histogram of normal
data",xlab="Approximately normally distributed data")
qqnorm(log(mod_1$PreGC),main="Normal QQ plot random normal variables (height)")
qqline(log(mod_1$PreGC),col="gray")
That looks much better!
hist(log(mod_1$nPrenatal.FA.before),probability=TRUE, main="Histogram of normal
data",xlab="Approximately normally distributed data")
## Warning in log(mod_1$nPrenatal.FA.before): NaNs produced
qqnorm(log(mod_1$nPrenatal.FA.before),main="Normal QQ plot random normal variables (height)")
## Warning in log(mod_1$nPrenatal.FA.before): NaNs produced
qqline(log(mod_1$nPrenatal.FA.before),col="gray")
## Warning in log(mod_1$nPrenatal.FA.before): NaNs produced
So this doesn’t look great, but the authors still used it so I’m going to roll with it. I will continue to check if the residuals of the model are normally distributed as I go on.
Build the model:
The authors use a generalized least-squares model to look at the relationship between PreGC and Food availability.
library(nlme) #this has the gls function
m1 <- gls(PreGC ~ nPrenatal.FA.before, data = mod_1) #first looking at the variables before they are log-transformed
plot(m1)
summary(m1)
## Generalized least squares fit by REML
## Model: PreGC ~ nPrenatal.FA.before
## Data: mod_1
## AIC BIC logLik
## 3342.495 3353.546 -1668.248
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 164.53698 4.018242 40.94751 0
## nPrenatal.FA.before -33.04241 4.024981 -8.20933 0
##
## Correlation:
## (Intr)
## nPrenatal.FA.before 0
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -1.8952489 -0.6946414 -0.1825968 0.4562914 4.7657596
##
## Residual standard error: 69.13244
## Degrees of freedom: 296 total; 294 residual
Let’s check the residuals for normal distributions, as the authors say that they did:
hist(residuals(m1))
qqnorm(residuals(m1),main="Normal QQ plot random normal variables (height)")
qqline(residuals(m1),col="gray")
Doesn’t look very normally distributed, so that must be why they used the log-transformed version of the PreGC variable:
m1.log <- gls(log(PreGC) ~ nPrenatal.FA.before, data = mod_1)
plot(m1.log)
summary(m1.log)
## Generalized least squares fit by REML
## Model: log(PreGC) ~ nPrenatal.FA.before
## Data: mod_1
## AIC BIC logLik
## 316.1433 327.194 -155.0716
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 5.002592 0.02337676 213.99853 0
## nPrenatal.FA.before -0.207810 0.02341597 -8.87473 0
##
## Correlation:
## (Intr)
## nPrenatal.FA.before 0
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -3.01626312 -0.69919161 0.01150996 0.69350339 2.78528232
##
## Residual standard error: 0.402189
## Degrees of freedom: 296 total; 294 residual
#check the residuals:
hist(residuals(m1.log))
qqnorm(residuals(m1.log),main="Normal QQ plot random normal variables (height)")
qqline(residuals(m1.log),col="gray")
Distribution looks great! The explanatory value for food availability resulting from the GLS is a bit off (-.2 compared to the -.16 that the authors found).
However, the authors say that they controlled for maternal rank, offspring sex, year of birth, time and day of gestation, and yet they have estimated values and statistics for each variable, which means that they were included in the model. The dialogue is a bit confusing, because they must have included them in the model to get these values, yet they even say in the figure caption “reduced model after exclusion of the collinear control variables.”
Since they would have to be included to have the estimates in the output, I will include them in the model:
m1.log.controlled <- gls(log(PreGC) ~ nPrenatal.FA.before + Maternal.rank + Sex.of.the.offspring + Year.of.birth + Gestation.day.1 + Day.time, data = mod_1) #including the other variables in the model
plot(m1.log.controlled)
summary(m1.log.controlled)
## Generalized least squares fit by REML
## Model: log(PreGC) ~ nPrenatal.FA.before + Maternal.rank + Sex.of.the.offspring + Year.of.birth + Gestation.day.1 + Day.time
## Data: mod_1
## AIC BIC logLik
## 341.1117 370.4431 -162.5558
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 5.002587 0.02309279 216.62986 0.0000
## nPrenatal.FA.before -0.154728 0.04339898 -3.56526 0.0004
## Maternal.rank 0.031288 0.02387422 1.31054 0.1911
## Sex.of.the.offspring -0.016563 0.02376618 -0.69692 0.4864
## Year.of.birth 0.037823 0.03165326 1.19492 0.2331
## Gestation.day.1 0.040600 0.03618236 1.12210 0.2628
## Day.time 0.068325 0.02329437 2.93312 0.0036
##
## Correlation:
## (Intr) nP.FA. Mtrnl. Sx.f.. Yr.f.b Gst..1
## nPrenatal.FA.before 0.000
## Maternal.rank 0.000 0.181
## Sex.of.the.offspring 0.000 -0.169 -0.161
## Year.of.birth 0.000 0.674 0.074 -0.121
## Gestation.day.1 0.000 0.758 0.198 -0.185 0.449
## Day.time 0.000 -0.055 0.013 0.015 -0.019 -0.106
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -3.26477166 -0.64520350 0.01845055 0.71823611 2.76028875
##
## Residual standard error: 0.3973033
## Degrees of freedom: 296 total; 289 residual
#Check the residuals, just for the sake of consistency:
hist(residuals(m1.log.controlled))
qqnorm(residuals(m1.log.controlled),main="Normal QQ plot random normal variables (height)")
qqline(residuals(m1.log.controlled),col="gray")
Now all values are very close to what the authors show, and the residuals are normally distributed. But I’m still not too sure why they would include these into the model. I am going to try to actually control for these extra variables by regressing them out of the model. Here I am controlling for maternal rank by regressing it out of our dependent variable. When you do this, you regress the variable you want to control for against your variable of interest, and use the residuals as your new variable of interest:
#First I will regress out maternal rank of our dependent variable by modelling them against each other and assigning the residuals of that model to an object within the original data.
mod_1$PreGC_ctrl1<-resid(gls(PreGC~Maternal.rank, data=mod_1))
mod_1$PreGC_ctrl2<-resid(gls(PreGC_ctrl1~Sex.of.the.offspring,data=mod_1)) #now do this until I have regressed out each control variable.
hist(mod_1$PreGC_ctrl2) #looks fine so far
mod_1$PreGC_ctrl3<-resid(gls(PreGC_ctrl2~Year.of.birth,data=mod_1))
mod_1$PreGC_ctrl4<-resid(gls(PreGC_ctrl3~Gestation.day.1,data=mod_1))
mod_1$PreGC_ctrl5<-resid(gls(PreGC_ctrl4~Day.time,data=mod_1))
So now the PreGC variable that has all of the variables controlled for is PreGC_ctrl5. Let’s see what different this has in the model against food availability:
m1.cntrl <- gls(PreGC_ctrl5 ~ nPrenatal.FA.before, data = mod_1)
plot(m1.cntrl)
summary(m1.cntrl)
## Generalized least squares fit by REML
## Model: PreGC_ctrl5 ~ nPrenatal.FA.before
## Data: mod_1
## AIC BIC logLik
## 3331.972 3343.023 -1662.986
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) -0.000146 3.946966 -0.0000369 1.000
## nPrenatal.FA.before -3.594154 3.953586 -0.9090871 0.364
##
## Correlation:
## (Intr)
## nPrenatal.FA.before 0
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -2.1891655 -0.6826009 -0.1303116 0.4949168 4.8449784
##
## Residual standard error: 67.90617
## Degrees of freedom: 296 total; 294 residual
So the plot doesn’t look quite as nice as before, and the values now are way off and insignificant. However, remember that they used the log transformed PreGC variable, and I didn’t here so that’s probably why that happened. For some reason, the gls() function is picky about which variables are log-transformed, and it didn’t want to run when I transformed the PreGC controlled for all of the other variables.
Now that I have those values, I am going to recreate the plot that they made showing the correlation between PreGC and food availability, with mother ID as a grouping variable.
library(ggplot2)
ggplot(data=mod_1,aes(y=log(PreGC),x=nPrenatal.FA.before))+geom_point(data=mod_1,shape=21,aes(fill=Mother.ID))+stat_smooth(method="lm")+theme_bw()
I am also going to check the R-squared value using the piecewiseSEM package, as the authors did.
library(piecewiseSEM)
##
## This is piecewiseSEM version 2.1.0.
##
##
## Questions or bugs can be addressed to <LefcheckJ@si.edu>.
rsquared(m1.log.controlled) #finding it for the model including other variables
## Response family link method R.squared
## 1 PreGC gaussian identity none 0.2434256
Looks like my R squared is just a little higher than the authors’, but pretty close! This is showing that PreGC is negatively correlated with prenatal food availability. We can infer from this that food shortages that result in reduced maternal physiological condition are associated with physiological stress.
Here, the authors wanted to analyze correlations between stress during gestation (PreGC) and postnatal maternal caretaking (maternal protectiveness and rejectiveness) and stress (PostGC).
Get the data:
f <- curl("https://raw.githubusercontent.com/MelZarate/mazarate-Berghanel2016-replication-assignment/master/model_2.csv")
mod_2 <- read.csv(f, header = TRUE, sep = ",", stringsAsFactors = TRUE)
head(mod_2)
## PreGC Maternal_protectiveness Maternal_rejectiveness PostGC
## 1 172.8307 -1.568 0.180 1.277
## 2 163.1227 -0.210 0.621 -0.549
## 3 147.4152 0.187 0.216 -0.316
## 4 153.2711 2.435 -1.300 -0.217
## 5 177.7823 0.714 1.242 -0.277
## 6 134.9228 -0.047 0.299 -0.503
## Postnatal_FA Sex_of_offspring
## 1 -0.647 -1.029
## 2 -0.487 -1.029
## 3 -0.624 -1.029
## 4 -0.585 -1.029
## 5 -0.671 -1.029
## 6 -0.677 -1.029
summary(mod_2)
## PreGC Maternal_protectiveness Maternal_rejectiveness
## Min. : 85.66 Min. :-1.968 Min. :-1.5820000
## 1st Qu.:149.73 1st Qu.:-0.327 1st Qu.:-0.4650000
## Median :167.92 Median :-0.047 Median : 0.1800000
## Mean :173.84 Mean : 0.000 Mean :-0.0000588
## 3rd Qu.:184.02 3rd Qu.: 0.187 3rd Qu.: 0.6210000
## Max. :245.77 Max. : 2.435 Max. : 2.1430000
## PostGC Postnatal_FA Sex_of_offspring
## Min. :-1.5670000 Min. :-0.6770000 Min. :-1.0290000
## 1st Qu.:-0.5490000 1st Qu.:-0.6650000 1st Qu.:-1.0290000
## Median :-0.2770000 Median :-0.5850000 Median : 0.9150000
## Mean : 0.0001176 Mean : 0.0001176 Mean : 0.0001765
## 3rd Qu.: 0.2760000 3rd Qu.: 1.1840000 3rd Qu.: 0.9150000
## Max. : 2.6140000 Max. : 1.6650000 Max. : 0.9150000
The data here incldues PreGC, maternal protectiveness and rejectiveness, PostGC, postnatal FA and offspring sex. They tested for these correlations by running a simple linear model including these variables, so I’ll start with just that:
m2 <- lm(data = mod_2, PreGC ~ Maternal_protectiveness + Maternal_rejectiveness + PostGC)
plot(m2)
summary(m2)
##
## Call:
## lm(formula = PreGC ~ Maternal_protectiveness + Maternal_rejectiveness +
## PostGC, data = mod_2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -35.899 -26.426 -8.669 18.775 76.784
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 173.838 8.913 19.505 5.21e-11 ***
## Maternal_protectiveness 11.122 9.251 1.202 0.2507
## Maternal_rejectiveness 15.857 9.189 1.726 0.1081
## PostGC 17.410 9.253 1.882 0.0825 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 36.75 on 13 degrees of freedom
## Multiple R-squared: 0.3611, Adjusted R-squared: 0.2137
## F-statistic: 2.449 on 3 and 13 DF, p-value: 0.11
hist(residuals(m2)) #plot the residuals
qqnorm(residuals(m2),main="Normal QQ plot random normal variables (height)")
qqline(residuals(m2),col="gray")
Well that looks messy. It is interesting because the authors did not mention taking the log-transformation of any variable, even though they did for PreGC in the last model, and, as we saw earlier, the log(PreGC) has a normal distribution. I’m just going to try it out of curiosity.
library(car) #this time I am going to use the qqPlot function to look at the residuals.
## Loading required package: carData
m2.log <- lm(data = mod_2, log(PreGC) ~ Maternal_protectiveness + Maternal_rejectiveness + PostGC)
plot(m2.log)
summary(m2.log)
##
## Call:
## lm(formula = log(PreGC) ~ Maternal_protectiveness + Maternal_rejectiveness +
## PostGC, data = mod_2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.26273 -0.14274 -0.02335 0.16071 0.41166
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.12952 0.05159 99.420 <2e-16 ***
## Maternal_protectiveness 0.08472 0.05355 1.582 0.1376
## Maternal_rejectiveness 0.10566 0.05319 1.986 0.0685 .
## PostGC 0.10837 0.05356 2.023 0.0641 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2127 on 13 degrees of freedom
## Multiple R-squared: 0.4271, Adjusted R-squared: 0.2949
## F-statistic: 3.23 on 3 and 13 DF, p-value: 0.0576
hist(residuals(m2.log)) #plot the residuals
qqPlot(m2.log$residuals)
## [1] 9 8
The plots here are still pretty messy. I’m going to go on with what the authors did without the log-transformation. Again, they say that they controlled for certain variables in this analysis: postnatal food availability and sex of the offspring. This makes sense, because we wouldn’t want offspring sex and the amount of food available to have an impact on the correlation between stress that the mother endures before and after birth. However, estimates are also given for these two variables, meaning that they must have actually been included in the model.
Here’s what that would look like:
m2.cntrl <- lm(data = mod_2, PreGC ~ Maternal_protectiveness + Maternal_rejectiveness +
PostGC + Postnatal_FA + Sex_of_offspring)
plot(m2.cntrl)
summary(m2.cntrl)
##
## Call:
## lm(formula = PreGC ~ Maternal_protectiveness + Maternal_rejectiveness +
## PostGC + Postnatal_FA + Sex_of_offspring, data = mod_2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -35.886 -10.575 -0.654 9.229 34.527
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 173.834 4.975 34.941 1.27e-12 ***
## Maternal_protectiveness 11.440 5.228 2.188 0.051148 .
## Maternal_rejectiveness 15.711 5.650 2.781 0.017878 *
## PostGC 19.127 5.620 3.403 0.005896 **
## Postnatal_FA 28.177 5.196 5.423 0.000209 ***
## Sex_of_offspring 2.985 6.147 0.486 0.636754
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 20.51 on 11 degrees of freedom
## Multiple R-squared: 0.8316, Adjusted R-squared: 0.755
## F-statistic: 10.86 on 5 and 11 DF, p-value: 0.0005821
hist(residuals(m2.cntrl)) #plot the residuals
qqPlot(m2.cntrl$residuals)
## [1] 11 13
In terms of the model plot and distribution of the residuals, this looks a lot better (keeping in mind the relatively low sample size of 17 here). Also the estimates and R squared value are spot on to what the authors found.
Going forward, the they did not give any plots to what the correlation between these variables, but if they would look like:
ggplot(data=mod_2,aes(y=PreGC,x=PostGC))+geom_point(data=mod_2,shape=21)+stat_smooth(method="lm")+theme_bw() #postGC
ggplot(data=mod_2,aes(y=PreGC,x=Maternal_protectiveness))+geom_point(data=mod_2,shape=21)+stat_smooth(method="lm")+theme_bw() #maternal protectiveness after giving birth
ggplot(data=mod_2,aes(y=PreGC,x=Maternal_rejectiveness))+geom_point(data=mod_2,shape=21)+stat_smooth(method="lm")+theme_bw() #maternal rejectiveness after giving birth
I can see why they didn’t give them- there seems to be a wide range of confidence with some points outside. Again, this may be due to the small sample size. It does, however, show that PreGC is positively correlated to maternal rejectiveness and PostGC. What the authors fail to discus is the slight positive correlation between PreGC and protectiveness.
This time, the authors attempt to quantify how offspring body size can be predicted by average PreGC during gestation and age at measurement, controlling for the interactions between age and the control variables.
Get the data:
f <- curl("https://raw.githubusercontent.com/MelZarate/mazarate-Berghanel2016-replication-assignment/master/model_3.csv")
mod_3 <- read.csv(f, header = TRUE, sep = ",", stringsAsFactors = TRUE)
head(mod_3)
## Body.size.index Age Age.1 Early.mid.gestation.PreGC
## 1 334.427 30.727 -1.627 1.775
## 2 390.210 55.154 -1.481 1.775
## 3 465.602 81.000 -1.327 1.775
## 4 675.121 116.588 -1.114 1.775
## 5 807.918 148.667 -0.923 1.775
## 6 750.874 176.000 -0.760 1.775
## PreGC.throughout.gestation Early.gestation.PreGC Mid.gestation.PreGC
## 1 1.965 2.477 0.616
## 2 1.965 2.477 0.616
## 3 1.965 2.477 0.616
## 4 1.965 2.477 0.616
## 5 1.965 2.477 0.616
## 6 1.965 2.477 0.616
## Late.gestation.PreGC Early.mid.gestation.FA FA.throughout.gestation
## 1 1.831 -1.625 -0.83
## 2 1.831 -1.625 -0.83
## 3 1.831 -1.625 -0.83
## 4 1.831 -1.625 -0.83
## 5 1.831 -1.625 -0.83
## 6 1.831 -1.625 -0.83
## nPostnatal.FA Maternal.protectiveness Maternal.rejectiveness
## 1 2.294 -0.105 2.25
## 2 2.393 -0.105 2.25
## 3 2.637 -0.105 2.25
## 4 2.919 -0.105 2.25
## 5 2.902 -0.105 2.25
## 6 2.834 -0.105 2.25
## nSocial.play....of.time. Sex.of.the.offspring PostGC Year.of.birth
## 1 -2.181 -1.029 -0.9 2.716
## 2 -1.922 -1.029 -0.9 2.716
## 3 -1.026 -1.029 -0.9 2.716
## 4 0.110 -1.029 -0.9 2.716
## 5 0.715 -1.029 -0.9 2.716
## 6 2.671 -1.029 -0.9 2.716
## Infant.ID
## 1 aug
## 2 aug
## 3 aug
## 4 aug
## 5 aug
## 6 aug
summary(mod_3)
## Body.size.index Age Age.1
## Min. : 203.7 Min. : 30.73 Min. :-1.6270000
## 1st Qu.: 602.0 1st Qu.:154.31 1st Qu.:-0.8895000
## Median : 849.7 Median :297.00 Median :-0.0380000
## Mean : 882.1 Mean :303.36 Mean : 0.0000132
## 3rd Qu.:1152.6 3rd Qu.:441.00 3rd Qu.: 0.8210000
## Max. :1705.6 Max. :635.75 Max. : 1.9830000
##
## Early.mid.gestation.PreGC PreGC.throughout.gestation
## Min. :-2.139000 Min. :-2.2570000
## 1st Qu.:-0.427000 1st Qu.:-0.4590000
## Median :-0.110000 Median :-0.0020000
## Mean : 0.000035 Mean : 0.0000352
## 3rd Qu.: 0.238000 3rd Qu.: 0.2810000
## Max. : 3.276000 Max. : 2.4040000
##
## Early.gestation.PreGC Mid.gestation.PreGC Late.gestation.PreGC
## Min. :-2.02700 Min. :-1.477000 Min. :-1.8520000
## 1st Qu.:-0.56600 1st Qu.:-0.856000 1st Qu.:-0.7570000
## Median :-0.12600 Median : 0.046000 Median :-0.0770000
## Mean : 0.00004 Mean :-0.000189 Mean :-0.0000749
## 3rd Qu.: 0.36400 3rd Qu.: 0.408000 3rd Qu.: 0.8040000
## Max. : 3.44100 Max. : 3.378000 Max. : 2.6780000
##
## Early.mid.gestation.FA FA.throughout.gestation nPostnatal.FA
## Min. :-1.9260000 Min. :-1.7940000 Min. :-1.2420000
## 1st Qu.:-0.8020000 1st Qu.:-0.7830000 1st Qu.:-0.6470000
## Median : 0.1050000 Median : 0.0750000 Median :-0.3750000
## Mean :-0.0000837 Mean :-0.0000044 Mean : 0.0000441
## 3rd Qu.: 1.0080000 3rd Qu.: 0.8900000 3rd Qu.: 0.4260000
## Max. : 1.4990000 Max. : 1.8540000 Max. : 2.9750000
##
## Maternal.protectiveness Maternal.rejectiveness nSocial.play....of.time.
## Min. :-1.8540000 Min. :-1.664000 Min. :-2.181000
## 1st Qu.:-0.3080000 1st Qu.:-1.194000 1st Qu.:-0.371000
## Median :-0.0340000 Median : 0.226000 Median : 0.056000
## Mean :-0.0000617 Mean :-0.000022 Mean :-0.000022
## 3rd Qu.: 0.4370000 3rd Qu.: 0.651000 3rd Qu.: 0.520000
## Max. : 2.3160000 Max. : 2.250000 Max. : 2.671000
##
## Sex.of.the.offspring PostGC Year.of.birth
## Min. :-1.0290000 Min. :-1.3820000 Min. :-0.3670000
## 1st Qu.:-1.0290000 1st Qu.:-0.6360000 1st Qu.:-0.3670000
## Median : 0.9670000 Median :-0.3280000 Median :-0.3670000
## Mean :-0.0002247 Mean :-0.0000705 Mean :-0.0002996
## 3rd Qu.: 0.9670000 3rd Qu.: 0.3980000 3rd Qu.:-0.3670000
## Max. : 0.9670000 Max. : 2.9070000 Max. : 2.7160000
##
## Infant.ID
## sin : 19
## gar : 18
## tao : 18
## fin : 17
## hop : 17
## kim : 17
## (Other):121
This data includes every variable that has been used thus far, along with infant ID, birth year, percent of time playing socially, FA throughout gestation, early-mid gestation FA, PreGC at different points of gestation (early, mid, late), body size index and age (in days and z-transformed; I will use z-transformed).
Build the model:
m3 <- gls(Body.size.index ~ Age.1 + (Early.mid.gestation.PreGC * Age.1) * (Early.mid.gestation.FA * Age.1) + (nPostnatal.FA * Age.1) + (Maternal.protectiveness * Age.1) + (Maternal.rejectiveness * Age.1) + (nSocial.play....of.time. * Age.1) + (Sex.of.the.offspring * Age.1) + (Year.of.birth * Age.1) + (PostGC * Age.1), data = mod_3) #Since we have been adding all the "controls" into the model, I am adding each of the controls' interactions with age in the model here
plot(m3)
summary(m3) #to see all of the estimate values of each control interactions, you have to go to the m1 object in the environment and look at the coefficients.
## Generalized least squares fit by REML
## Model: Body.size.index ~ Age.1 + (Early.mid.gestation.PreGC * Age.1) * (Early.mid.gestation.FA * Age.1) + (nPostnatal.FA * Age.1) + (Maternal.protectiveness * Age.1) + (Maternal.rejectiveness * Age.1) + (nSocial.play....of.time. * Age.1) + (Sex.of.the.offspring * Age.1) + (Year.of.birth * Age.1) + (PostGC * Age.1)
## Data: mod_3
## AIC BIC logLik
## 2576.01 2652.44 -1265.005
##
## Coefficients:
## Value Std.Error
## (Intercept) 899.8834 11.28666
## Age.1 347.6220 20.36299
## Early.mid.gestation.PreGC 9.6144 21.11096
## Early.mid.gestation.FA -0.2587 11.03293
## nPostnatal.FA 59.6528 31.11646
## Maternal.protectiveness -4.4985 10.10448
## Maternal.rejectiveness -5.3460 8.82104
## nSocial.play....of.time. 1.5251 11.86201
## Sex.of.the.offspring 23.6705 8.53218
## Year.of.birth 28.5197 41.89121
## PostGC 9.4501 16.71380
## Age.1:Early.mid.gestation.PreGC 29.5030 22.16923
## Age.1:Early.mid.gestation.FA -27.8736 11.73771
## Early.mid.gestation.PreGC:Early.mid.gestation.FA 19.6551 8.66674
## Age.1:nPostnatal.FA 63.6426 14.93546
## Age.1:Maternal.protectiveness -1.9328 10.19048
## Age.1:Maternal.rejectiveness -15.2483 9.22975
## Age.1:nSocial.play....of.time. -20.4686 9.75384
## Age.1:Sex.of.the.offspring 26.1290 8.92136
## Age.1:Year.of.birth -28.9930 25.29207
## Age.1:PostGC 1.7175 17.01704
## Age.1:Early.mid.gestation.PreGC:Early.mid.gestation.FA 17.7137 8.18354
## t-value p-value
## (Intercept) 79.72984 0.0000
## Age.1 17.07126 0.0000
## Early.mid.gestation.PreGC 0.45542 0.6493
## Early.mid.gestation.FA -0.02344 0.9813
## nPostnatal.FA 1.91708 0.0566
## Maternal.protectiveness -0.44520 0.6566
## Maternal.rejectiveness -0.60605 0.5452
## nSocial.play....of.time. 0.12857 0.8978
## Sex.of.the.offspring 2.77426 0.0060
## Year.of.birth 0.68080 0.4968
## PostGC 0.56541 0.5724
## Age.1:Early.mid.gestation.PreGC 1.33081 0.1847
## Age.1:Early.mid.gestation.FA -2.37470 0.0185
## Early.mid.gestation.PreGC:Early.mid.gestation.FA 2.26788 0.0244
## Age.1:nPostnatal.FA 4.26118 0.0000
## Age.1:Maternal.protectiveness -0.18967 0.8498
## Age.1:Maternal.rejectiveness -1.65208 0.1000
## Age.1:nSocial.play....of.time. -2.09852 0.0371
## Age.1:Sex.of.the.offspring 2.92882 0.0038
## Age.1:Year.of.birth -1.14633 0.2530
## Age.1:PostGC 0.10093 0.9197
## Age.1:Early.mid.gestation.PreGC:Early.mid.gestation.FA 2.16456 0.0316
##
## Correlation:
## (Intr) Age.1
## Age.1 0.214
## Early.mid.gestation.PreGC -0.034 -0.015
## Early.mid.gestation.FA -0.157 -0.205
## nPostnatal.FA 0.085 -0.794
## Maternal.protectiveness -0.238 -0.091
## Maternal.rejectiveness -0.155 0.063
## nSocial.play....of.time. -0.278 -0.255
## Sex.of.the.offspring 0.008 -0.145
## Year.of.birth 0.135 0.812
## PostGC -0.021 -0.113
## Age.1:Early.mid.gestation.PreGC -0.098 -0.010
## Age.1:Early.mid.gestation.FA -0.089 -0.325
## Early.mid.gestation.PreGC:Early.mid.gestation.FA 0.432 0.001
## Age.1:nPostnatal.FA -0.306 0.089
## Age.1:Maternal.protectiveness -0.050 -0.084
## Age.1:Maternal.rejectiveness -0.063 -0.192
## Age.1:nSocial.play....of.time. -0.135 0.132
## Age.1:Sex.of.the.offspring -0.101 0.131
## Age.1:Year.of.birth 0.505 0.275
## Age.1:PostGC 0.069 0.103
## Age.1:Early.mid.gestation.PreGC:Early.mid.gestation.FA 0.175 0.263
## Er...PGC E...FA
## Age.1
## Early.mid.gestation.PreGC
## Early.mid.gestation.FA 0.101
## nPostnatal.FA -0.070 0.075
## Maternal.protectiveness -0.508 0.414
## Maternal.rejectiveness -0.392 0.336
## nSocial.play....of.time. -0.203 0.327
## Sex.of.the.offspring -0.324 0.052
## Year.of.birth -0.131 0.075
## PostGC -0.879 0.157
## Age.1:Early.mid.gestation.PreGC 0.606 -0.052
## Age.1:Early.mid.gestation.FA -0.099 0.094
## Early.mid.gestation.PreGC:Early.mid.gestation.FA -0.292 -0.093
## Age.1:nPostnatal.FA 0.111 0.032
## Age.1:Maternal.protectiveness -0.329 0.140
## Age.1:Maternal.rejectiveness -0.255 0.171
## Age.1:nSocial.play....of.time. -0.250 0.184
## Age.1:Sex.of.the.offspring -0.303 0.149
## Age.1:Year.of.birth -0.340 0.002
## Age.1:PostGC -0.600 0.028
## Age.1:Early.mid.gestation.PreGC:Early.mid.gestation.FA -0.521 -0.198
## nPs.FA Mtrnl.p
## Age.1
## Early.mid.gestation.PreGC
## Early.mid.gestation.FA
## nPostnatal.FA
## Maternal.protectiveness -0.049
## Maternal.rejectiveness -0.137 0.508
## nSocial.play....of.time. -0.093 0.500
## Sex.of.the.offspring 0.048 0.347
## Year.of.birth -0.851 0.191
## PostGC 0.113 0.585
## Age.1:Early.mid.gestation.PreGC 0.091 -0.356
## Age.1:Early.mid.gestation.FA 0.286 0.175
## Early.mid.gestation.PreGC:Early.mid.gestation.FA 0.055 -0.095
## Age.1:nPostnatal.FA -0.237 -0.066
## Age.1:Maternal.protectiveness -0.040 0.302
## Age.1:Maternal.rejectiveness 0.092 0.280
## Age.1:nSocial.play....of.time. -0.162 0.381
## Age.1:Sex.of.the.offspring -0.186 0.320
## Age.1:Year.of.birth -0.010 0.190
## Age.1:PostGC -0.175 0.358
## Age.1:Early.mid.gestation.PreGC:Early.mid.gestation.FA 0.001 0.010
## Mtrnl.r nS....
## Age.1
## Early.mid.gestation.PreGC
## Early.mid.gestation.FA
## nPostnatal.FA
## Maternal.protectiveness
## Maternal.rejectiveness
## nSocial.play....of.time. 0.264
## Sex.of.the.offspring 0.306 0.466
## Year.of.birth 0.270 0.157
## PostGC 0.373 0.363
## Age.1:Early.mid.gestation.PreGC -0.159 -0.373
## Age.1:Early.mid.gestation.FA 0.116 0.245
## Early.mid.gestation.PreGC:Early.mid.gestation.FA -0.166 0.107
## Age.1:nPostnatal.FA 0.005 0.001
## Age.1:Maternal.protectiveness 0.202 0.391
## Age.1:Maternal.rejectiveness 0.165 0.332
## Age.1:nSocial.play....of.time. 0.246 0.515
## Age.1:Sex.of.the.offspring 0.152 0.463
## Age.1:Year.of.birth 0.151 0.122
## Age.1:PostGC 0.178 0.326
## Age.1:Early.mid.gestation.PreGC:Early.mid.gestation.FA -0.051 -0.117
## Sx.f.. Yr.f.b
## Age.1
## Early.mid.gestation.PreGC
## Early.mid.gestation.FA
## nPostnatal.FA
## Maternal.protectiveness
## Maternal.rejectiveness
## nSocial.play....of.time.
## Sex.of.the.offspring
## Year.of.birth 0.032
## PostGC 0.305 0.087
## Age.1:Early.mid.gestation.PreGC -0.378 -0.242
## Age.1:Early.mid.gestation.FA 0.192 -0.224
## Early.mid.gestation.PreGC:Early.mid.gestation.FA 0.347 0.058
## Age.1:nPostnatal.FA -0.146 0.246
## Age.1:Maternal.protectiveness 0.303 0.153
## Age.1:Maternal.rejectiveness 0.218 -0.010
## Age.1:nSocial.play....of.time. 0.390 0.286
## Age.1:Sex.of.the.offspring 0.244 0.284
## Age.1:Year.of.birth 0.232 0.365
## Age.1:PostGC 0.378 0.321
## Age.1:Early.mid.gestation.PreGC:Early.mid.gestation.FA 0.088 0.079
## PostGC Ag.1:E...PGC
## Age.1
## Early.mid.gestation.PreGC
## Early.mid.gestation.FA
## nPostnatal.FA
## Maternal.protectiveness
## Maternal.rejectiveness
## nSocial.play....of.time.
## Sex.of.the.offspring
## Year.of.birth
## PostGC
## Age.1:Early.mid.gestation.PreGC -0.640
## Age.1:Early.mid.gestation.FA 0.144 0.008
## Early.mid.gestation.PreGC:Early.mid.gestation.FA 0.297 -0.564
## Age.1:nPostnatal.FA -0.128 -0.016
## Age.1:Maternal.protectiveness 0.384 -0.577
## Age.1:Maternal.rejectiveness 0.311 -0.531
## Age.1:nSocial.play....of.time. 0.281 -0.266
## Age.1:Sex.of.the.offspring 0.367 -0.353
## Age.1:Year.of.birth 0.304 -0.370
## Age.1:PostGC 0.610 -0.894
## Age.1:Early.mid.gestation.PreGC:Early.mid.gestation.FA 0.399 -0.235
## A.1:E...F E...PGC:
## Age.1
## Early.mid.gestation.PreGC
## Early.mid.gestation.FA
## nPostnatal.FA
## Maternal.protectiveness
## Maternal.rejectiveness
## nSocial.play....of.time.
## Sex.of.the.offspring
## Year.of.birth
## PostGC
## Age.1:Early.mid.gestation.PreGC
## Age.1:Early.mid.gestation.FA
## Early.mid.gestation.PreGC:Early.mid.gestation.FA -0.076
## Age.1:nPostnatal.FA -0.215 0.019
## Age.1:Maternal.protectiveness 0.423 0.116
## Age.1:Maternal.rejectiveness 0.385 0.096
## Age.1:nSocial.play....of.time. 0.229 -0.037
## Age.1:Sex.of.the.offspring 0.040 0.132
## Age.1:Year.of.birth 0.321 0.165
## Age.1:PostGC 0.145 0.462
## Age.1:Early.mid.gestation.PreGC:Early.mid.gestation.FA -0.123 0.419
## A.1:P. Ag.1:Mtrnl.p
## Age.1
## Early.mid.gestation.PreGC
## Early.mid.gestation.FA
## nPostnatal.FA
## Maternal.protectiveness
## Maternal.rejectiveness
## nSocial.play....of.time.
## Sex.of.the.offspring
## Year.of.birth
## PostGC
## Age.1:Early.mid.gestation.PreGC
## Age.1:Early.mid.gestation.FA
## Early.mid.gestation.PreGC:Early.mid.gestation.FA
## Age.1:nPostnatal.FA
## Age.1:Maternal.protectiveness -0.114
## Age.1:Maternal.rejectiveness -0.133 0.535
## Age.1:nSocial.play....of.time. -0.379 0.422
## Age.1:Sex.of.the.offspring -0.043 0.320
## Age.1:Year.of.birth -0.497 0.420
## Age.1:PostGC 0.044 0.608
## Age.1:Early.mid.gestation.PreGC:Early.mid.gestation.FA 0.034 -0.170
## Ag.1:Mtrnl.r
## Age.1
## Early.mid.gestation.PreGC
## Early.mid.gestation.FA
## nPostnatal.FA
## Maternal.protectiveness
## Maternal.rejectiveness
## nSocial.play....of.time.
## Sex.of.the.offspring
## Year.of.birth
## PostGC
## Age.1:Early.mid.gestation.PreGC
## Age.1:Early.mid.gestation.FA
## Early.mid.gestation.PreGC:Early.mid.gestation.FA
## Age.1:nPostnatal.FA
## Age.1:Maternal.protectiveness
## Age.1:Maternal.rejectiveness
## Age.1:nSocial.play....of.time. 0.317
## Age.1:Sex.of.the.offspring 0.408
## Age.1:Year.of.birth 0.315
## Age.1:PostGC 0.485
## Age.1:Early.mid.gestation.PreGC:Early.mid.gestation.FA -0.104
## A.1:S.... Ag.1:S...
## Age.1
## Early.mid.gestation.PreGC
## Early.mid.gestation.FA
## nPostnatal.FA
## Maternal.protectiveness
## Maternal.rejectiveness
## nSocial.play....of.time.
## Sex.of.the.offspring
## Year.of.birth
## PostGC
## Age.1:Early.mid.gestation.PreGC
## Age.1:Early.mid.gestation.FA
## Early.mid.gestation.PreGC:Early.mid.gestation.FA
## Age.1:nPostnatal.FA
## Age.1:Maternal.protectiveness
## Age.1:Maternal.rejectiveness
## Age.1:nSocial.play....of.time.
## Age.1:Sex.of.the.offspring 0.490
## Age.1:Year.of.birth 0.547 0.269
## Age.1:PostGC 0.294 0.285
## Age.1:Early.mid.gestation.PreGC:Early.mid.gestation.FA -0.034 0.251
## A.1:Y. A.1:PG
## Age.1
## Early.mid.gestation.PreGC
## Early.mid.gestation.FA
## nPostnatal.FA
## Maternal.protectiveness
## Maternal.rejectiveness
## nSocial.play....of.time.
## Sex.of.the.offspring
## Year.of.birth
## PostGC
## Age.1:Early.mid.gestation.PreGC
## Age.1:Early.mid.gestation.FA
## Early.mid.gestation.PreGC:Early.mid.gestation.FA
## Age.1:nPostnatal.FA
## Age.1:Maternal.protectiveness
## Age.1:Maternal.rejectiveness
## Age.1:nSocial.play....of.time.
## Age.1:Sex.of.the.offspring
## Age.1:Year.of.birth
## Age.1:PostGC 0.413
## Age.1:Early.mid.gestation.PreGC:Early.mid.gestation.FA 0.132 0.208
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -3.45668056 -0.56298375 0.02581968 0.58193303 2.80063984
##
## Residual standard error: 90.10255
## Degrees of freedom: 227 total; 205 residual
The values are a little off (example: I got -28 for age and mid-gestation food availabtilty while the authors got -24). But let’s look at the R squared value:
library(piecewiseSEM)
rsquared(m3)
## Response family link method R.squared
## 1 Body.size.index gaussian identity none 0.944954
0.94, just like the authors found! This is good, but let’s look at the residuals distribution (again, because the authors checked with each model):
hist(residuals(m3))
qqPlot(m3$residuals)
## [1] 64 19
Looks pretty normal!
Now I am going to attempt to recreate the plot that the authors made. They plotted the estimated coefficients for age against early to mid PreGC levels. The output of the model was used to build a function in order to get these estimated values (the actual values of PreGC are between -1 and 1, they extend it to -2 and 2). The coefficient for age is 347.6 in my output (compared to their 339.7), meaning that when the PreGC is at 0, this is the age coefficient. The coefficient for the interaction between PreGC and age is 29.5 (way lower than the 57.8 that they found). Therefore, when preGC increases by one unit, the age coefficient increases by 29.5 units. I can use these values to build the simple linear function y=29.5x+347.6.
ggplot(data=data.frame( x=c(-2,2),y=c(200,500) ), aes(x=x,y=y)) +
geom_blank() +
geom_abline(slope = 29.5, intercept = 347.6) +
labs(y= "Estimated Coefficient for Age", x = "Early-mid Gestation PreGC")
Mine looks different from theirs because the slope is practically half of what they found.
With these results, we are able to say that PreGC was a good predictor of offspring growth rate, and that growth rate was positively correlated with PreGC.
Here, the authors run a GLS to see if prenatal stress impacts the size of the offspring at ages 16-18 months. They also control for the specific age that the body measurement was made at, as well as average PostGC during lactation. However, it also looks like they included the additional variables in the model, as they provide estimates for each in Figure 1 (d). Here’s what that would look like:
Load the data:
f <- curl("https://raw.githubusercontent.com/MelZarate/mazarate-Berghanel2016-replication-assignment/master/model_4.csv")
mod_4 <- read.csv(f, header = TRUE, sep = ",", stringsAsFactors = TRUE)
head(mod_4)
## Body.size.index Age Early.mid.gestation.PreGC
## 1 893.8059 467.8333 0.400
## 2 1246.3803 492.0000 0.400
## 3 1424.9866 537.0000 0.400
## 4 1197.2567 468.9231 0.017
## 5 1370.0400 489.0000 0.017
## 6 1398.0732 527.0000 0.017
## Early.mid.gestation.FA nPostnatal.FA Maternal.protectiveness
## 1 1.131 -1.449 -1.359
## 2 1.131 -0.980 -1.359
## 3 1.131 -0.004 -1.359
## 4 0.920 -1.219 1.060
## 5 0.920 -0.835 1.060
## 6 0.920 0.025 1.060
## Maternal.rejectiveness nSocial.play....of.time. Age.1
## 1 0.208 0.884 -1.039
## 2 0.208 0.868 -0.162
## 3 0.208 0.982 1.472
## 4 -0.120 -0.373 -0.999
## 5 -0.120 -0.482 -0.270
## 6 -0.120 -0.416 1.109
## Sex.of.the.offspring PostGC Infant.ID
## 1 -1.045 1.239 fin
## 2 -1.045 1.239 fin
## 3 -1.045 1.239 fin
## 4 0.929 -0.705 gar
## 5 0.929 -0.705 gar
## 6 0.929 -0.705 gar
summary(mod_4)
## Body.size.index Age Early.mid.gestation.PreGC
## Min. : 893.8 Min. :456.0 Min. :-2.2150000
## 1st Qu.:1186.4 1st Qu.:470.4 1st Qu.:-0.3320000
## Median :1286.4 Median :492.0 Median : 0.0170000
## Mean :1301.0 Mean :496.5 Mean : 0.0000294
## 3rd Qu.:1424.4 3rd Qu.:523.7 3rd Qu.: 0.3842500
## Max. :1680.3 Max. :542.0 Max. : 2.2230000
##
## Early.mid.gestation.FA nPostnatal.FA Maternal.protectiveness
## Min. :-1.4480000 Min. :-1.9950000 Min. :-1.7080000
## 1st Qu.:-0.9807500 1st Qu.:-0.8350000 1st Qu.:-0.2862500
## Median :-0.0220000 Median :-0.0050000 Median :-0.1100000
## Mean :-0.0001471 Mean :-0.0000882 Mean : 0.0000588
## 3rd Qu.: 0.9200000 3rd Qu.: 0.8430000 3rd Qu.: 0.5097500
## Max. : 1.4690000 Max. : 1.7610000 Max. : 2.1210000
##
## Maternal.rejectiveness nSocial.play....of.time. Age.1
## Min. :-1.6690000 Min. :-1.0290000 Min. :-1.4680000
## 1st Qu.:-1.0560000 1st Qu.:-0.6870000 1st Qu.:-0.9440000
## Median : 0.2905000 Median :-0.3935000 Median :-0.1620000
## Mean :-0.0001471 Mean : 0.0000294 Mean :-0.0000588
## 3rd Qu.: 0.7805000 3rd Qu.: 0.4680000 3rd Qu.: 0.9890000
## Max. : 1.3380000 Max. : 2.2910000 Max. : 1.6530000
##
## Sex.of.the.offspring PostGC Infant.ID
## Min. :-1.0450000 Min. :-0.9640 fin : 3
## 1st Qu.:-1.0450000 1st Qu.:-0.6182 gar : 3
## Median : 0.9290000 Median :-0.3520 hop : 3
## Mean : 0.0000588 Mean : 0.0000 iri : 3
## 3rd Qu.: 0.9290000 3rd Qu.: 0.1715 kim : 3
## Max. : 0.9290000 Max. : 2.6070 mii : 3
## (Other):16
Build the model:
m4 <- gls(Body.size.index ~ Early.mid.gestation.PreGC + Early.mid.gestation.FA + nPostnatal.FA + Maternal.protectiveness + Maternal.rejectiveness + nSocial.play....of.time. + Sex.of.the.offspring + Age.1 + PostGC, data = mod_4) #Age.1 is the z-transformation of the variable.
plot(m4)
summary(m4)
## Generalized least squares fit by REML
## Model: Body.size.index ~ Early.mid.gestation.PreGC + Early.mid.gestation.FA + nPostnatal.FA + Maternal.protectiveness + Maternal.rejectiveness + nSocial.play....of.time. + Sex.of.the.offspring + Age.1 + PostGC
## Data: mod_4
## AIC BIC logLik
## 339.0395 351.9981 -158.5198
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 1300.9560 17.81085 73.04289 0.0000
## Early.mid.gestation.PreGC 203.0351 68.80264 2.95098 0.0070
## Early.mid.gestation.FA -293.1991 152.61955 -1.92111 0.0667
## nPostnatal.FA -327.8248 204.34568 -1.60427 0.1217
## Maternal.protectiveness -34.6652 38.07323 -0.91049 0.3716
## Maternal.rejectiveness -12.3313 23.70758 -0.52014 0.6077
## nSocial.play....of.time. -45.5787 34.30227 -1.32874 0.1964
## Sex.of.the.offspring 30.4815 29.98789 1.01646 0.3195
## Age.1 259.3535 115.51914 2.24511 0.0342
## PostGC -106.7996 55.06557 -1.93950 0.0643
##
## Correlation:
## (Intr) E...PG E...FA nPs.FA Mtrnl.p Mtrnl.r
## Early.mid.gestation.PreGC -0.001
## Early.mid.gestation.FA 0.002 -0.475
## nPostnatal.FA 0.002 -0.541 0.981
## Maternal.protectiveness 0.001 -0.735 0.325 0.300
## Maternal.rejectiveness 0.000 -0.315 0.096 0.039 0.438
## nSocial.play....of.time. 0.000 -0.424 0.048 -0.002 0.654 0.346
## Sex.of.the.offspring 0.000 -0.423 0.166 0.138 0.613 0.407
## Age.1 -0.002 0.547 -0.969 -0.987 -0.315 -0.044
## PostGC 0.001 -0.930 0.429 0.473 0.760 0.257
## nS.... Sx.f.. Age.1
## Early.mid.gestation.PreGC
## Early.mid.gestation.FA
## nPostnatal.FA
## Maternal.protectiveness
## Maternal.rejectiveness
## nSocial.play....of.time.
## Sex.of.the.offspring 0.747
## Age.1 -0.039 -0.156
## PostGC 0.466 0.408 -0.479
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -1.9487112 -0.5547247 -0.1939383 0.7198240 1.7619102
##
## Residual standard error: 103.854
## Degrees of freedom: 34 total; 24 residual
#checking the residuals...
hist(residuals(m4))
qqPlot(m4$residuals)
## [1] 1 34
Estimates check out. The residuals are a little wacky but I’ll roll with it, now the R squared:
rsquared(m4) #finding it for the model including other variables
## Response family link method R.squared
## 1 Body.size.index gaussian identity none 0.7461494
For this model, my R squared is a bit off (.746 compared to the authors’ .651). This is a bit strange because for model 3, my estimates were more off, but the R squared was the same. For model 4, the opposite is true. Let’s see if I can get a similar looking plot, using infant ID as a grouping variable:
ggplot(data=mod_4,aes(y=Body.size.index,x=Early.mid.gestation.PreGC))+geom_point(data=mod_4,aes(fill=Infant.ID), shape=21)+stat_smooth(method="lm")+theme_bw()
This is different than the authors’ figure (they used the residuals of the model produced by each variable), but still shows the positive effect it has on body size at 16-18 months of age.
Let’s look into a inferential snalysis…
The authors wanted to see how different types of mother-infant inteactions during lactation belong to independent style dimensions. They did this on SPSS but I’m going to try it out here.
What’s in the data: The Hinde index is the difference between the proportion of approaches and departures of the mother to asses the mother’s maintenance of a 1.5 m proximity to her infant. Also included are age of nipple refusal, aggression, restraint, carrying, body contact and clasping. It is easy to be able to conceptually place each of these variables in to “protectiveness” or “rejectiveness” (this is what I have been using in the previous models), but the PCA is just a way to show that these varibles can be placed into those categories.
Data:
f <- curl("https://raw.githubusercontent.com/MelZarate/mazarate-Berghanel2016-replication-assignment/master/PCA_data.csv")
mat_pca <- read.csv(f, header = TRUE, sep = ",", stringsAsFactors = TRUE)
head(mat_pca)
## Close.proximity..average.duration. Body.contact..average.duration.
## 1 -32.413717 -23.2302272
## 2 7.260602 -6.8874291
## 3 3.308484 5.4312788
## 4 30.756496 30.1727280
## 5 23.604100 30.2245084
## 6 7.969163 0.5489075
## Body.contact..total.time. Clasping......of.time. Carrying......of.time.
## 1 -30.3061293 -47.792107 -12.85685
## 2 -19.0462407 22.841888 -11.90272
## 3 -3.0286044 4.041515 17.62762
## 4 24.3906355 49.823883 42.46480
## 5 5.2285686 1.051126 18.12859
## 6 -0.1420823 -23.761770 25.26355
## Restrain.rate Hinde.index.mother..proximity. Aggression.rate
## 1 -35.30936 -11.969445 63.600378
## 2 40.89176 -21.595003 -2.189308
## 3 -19.22368 -6.500814 5.923225
## 4 118.61850 70.387857 -46.247915
## 5 -29.49284 -5.143914 66.422813
## 6 -61.37279 -28.810581 36.761633
## Age.of.refused.nipple.contact
## 1 -12.78013
## 2 -36.03876
## 3 4.66384
## 4 102.78619
## 5 -24.04603
## 6 24.28831
In the caption of the author’s table, they state the p value of the Bartlett’s test. This test determines if the samples are from populations with equal variances. If all variances were equal, then a PCA would be inappropriate with this data. The p value the author’s got doing this on SPSS was below 0.001, but here’s how we would run this on r:
bart<-function(mat_pca){ #object will be the function of the raw data
R<-cor(mat_pca)
p<-ncol(mat_pca)
n<-nrow(mat_pca)
chi2<- -((n-1)-((2*p)+5)/6 ) * log(det(R)) #this is the formula
df<-(p*(p-1)/2)
crit<-qchisq(.95,df) #critical value using chi squared
p<-pchisq(chi2,df,lower.tail=F) # to get the pvalue
cat("Bartlett's test of sphericity: X2(",
df,")=",chi2,", p=",
round(p,3),sep="" )
}
bart(mat_pca) #p=0
## Bartlett's test of sphericity: X2(36)=87.50931, p=0
According to this test, a PCA is indeed appropriate. I’ll try it out using the psych package. The authors also state in the caption “Cut-off value=0.4,” so a 40% differentiation or higher will be shown. They also state KMO=0.716. This stands for Kaiser, Meyer, Olkin (KMO) Measure of Sampling Adequacy, which is just another test to make sure the data is suited for a PCA. It measures sampling accuracy for each variable in the model and for the complete model. The value itself is a measure of the proportion of variance among variables that might be common variance. The psych package has a function to compute this for me:
library(psych)
##
## Attaching package: 'psych'
## The following object is masked from 'package:car':
##
## logit
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
KMO(mat_pca)
## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = mat_pca)
## Overall MSA = 0.72
## MSA for each item =
## Close.proximity..average.duration. Body.contact..average.duration.
## 0.82 0.77
## Body.contact..total.time. Clasping......of.time.
## 0.82 0.83
## Carrying......of.time. Restrain.rate
## 0.68 0.80
## Hinde.index.mother..proximity. Aggression.rate
## 0.69 0.27
## Age.of.refused.nipple.contact
## 0.40
Here, the MSA is the value I’m looking for, and it is the same as what the authors found. It is high enough that I can feel okay running a PCA on the data.
Now that we know that the variance in the data is suitable for a PCA, let’s run the thing! I’m going to
pca<-principal(mat_pca,nfactor=2,rotate="none") #extracting 2 components
pca
## Principal Components Analysis
## Call: principal(r = mat_pca, nfactors = 2, rotate = "none")
## Standardized loadings (pattern matrix) based upon correlation matrix
## PC1 PC2 h2 u2 com
## Close.proximity..average.duration. 0.91 0.32 0.92 0.075 1.2
## Body.contact..average.duration. 0.90 0.06 0.81 0.189 1.0
## Body.contact..total.time. 0.84 -0.15 0.73 0.270 1.1
## Clasping......of.time. 0.79 0.13 0.64 0.363 1.1
## Carrying......of.time. 0.79 0.18 0.65 0.349 1.1
## Restrain.rate 0.64 -0.13 0.43 0.573 1.1
## Hinde.index.mother..proximity. 0.55 -0.16 0.33 0.667 1.2
## Aggression.rate -0.05 0.86 0.74 0.264 1.0
## Age.of.refused.nipple.contact 0.32 -0.74 0.66 0.344 1.4
##
## PC1 PC2
## SS loadings 4.40 1.50
## Proportion Var 0.49 0.17
## Cumulative Var 0.49 0.66
## Proportion Explained 0.75 0.25
## Cumulative Proportion 0.75 1.00
##
## Mean item complexity = 1.1
## Test of the hypothesis that 2 components are sufficient.
##
## The root mean square of the residuals (RMSR) is 0.12
## with the empirical chi square 19.11 with prob < 0.45
##
## Fit based upon off diagonal values = 0.93
Looks like I got all of the same values! The PC1 and PC2 can be defined as “protectiveness” and “rejectiveness,” respectively. The reason why they don’t have the PC1 values for aggression rate and age of refused nipple contact is because they fell below that .4 cut-off value. The same is true for the reason why they don’t show the PC2 values for all of the other variables. If you look at the variables themselves, this distinction makes complete sense, as aggression rate and age of refused nipple are the only “rejective” attributes being tested here. Therefore, the PCA does show distinction between the mothering styles.
Because I like to visualize these things, I am just going to use the base plot() function to look at this:
plot(pca)
As we could expect, variable 8 and 9 are excluded from the other variables (represented by blue dots). This is how the authors decided to split the two components in to protectiveness and rejectiveness to be used throughout the analysis of the data.
From their analyses, Berghanel et al. (2016) were able to conclude that prenatal stress (preGC) has strong effects on offspring developmental phenotype. With this, they found through the first model that stress is related to reduced maternal physiological condition due to a lack of food availabilty. This most likely resulted in low energy intake by the offspring during gestation, causing reduced growth rates early in life and supporting the developmental constraint hypothesis. The authors also discus the idea that mothers may compensate for this with increased maternal protectives, but did not test for this.
The most difficult part about the reproduction of these authors’ results was their use of the word “control.” They stated that they controlled for variables and then included them in their models. This is why, in the first model, I took an extra step to regress out these control variables. After some time, however, it made more sense to me to include them, conceptually, they could have an impact on the output (example: maternal rank could impact PreGC levels because higher ranked females may get more food than others). Therefore, I support their decision to include (most of) these “control” variables in the models, but think that different terminology should have been used. The authors also left out any visualization of the second model, potentially because the plots do not show very strong correlation. I believe that the use of the log-transformed PreGC variable would have better to use in this model, and they do not explain why they transform this variable in the first model but not the second. They also leave out the reasoning behind plotting the coefficients of the models instead of the variables themselves. The regression coefficients, however, well analyzed in the results of the paper and it is safe to say that they were able to support their hypothese with an array of proper analyses.
Berhanel, A., Heistermann, M., Schulke, O. and Ostner, J. (2016). Prenatal stress effects in a wild, long-lived primate: predictive adaptive responses in an unpredictable environment. Proc R. Soc. B, 283, 20161304.